81 research outputs found

    Catálogo de herramientas informáticas relacionadas con la creación, gestión y explotación de corpus textuales

    Corpora are important linguistic resources from which a great deal of information on real language use can be obtained. This work surveys the main resources in the field, encompassing both precompiled corpora and IT resources for their compilation, processing and exploitation.

    Spanish named entity recognition in the biomedical domain

    Named Entity Recognition in the clinical domain, and in languages other than English, is hampered by the absence of complete dictionaries, the informality of texts, the polysemy of terms, the lack of agreement on entity boundaries, and the scarcity of corpora and of other available resources. We present a Named Entity Recognition method for poorly resourced languages. The method was tested on Spanish radiology reports and compared with a conditional random fields system.
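
    The conditional random fields system used for comparison relies on local token features. A minimal sketch of the kind of feature extraction typically fed to such a sequence labeller (feature names and the sample report are illustrative assumptions, not taken from the paper):

```python
# Hypothetical sketch: local token features of the kind commonly used by a
# CRF sequence labeller for clinical NER. Not the paper's actual feature set.
def token_features(tokens, i):
    """Return a feature dict for the token at position i."""
    w = tokens[i]
    return {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),   # capitalisation often signals entities
        "word.isdigit": w.isdigit(),
        "suffix3": w[-3:],             # suffixes capture morphology
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

report = "Opacidad en lobulo superior derecho".split()
print(token_features(report, 1))
```

    In a full system, one such dictionary per token would be passed to a CRF toolkit together with BIO-style labels.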

    A descriptive study of WordNet (MCR) and linguistic synsets

    This article presents the work carried out to apply WordNet MCR to the linguistic domain and discusses the problematic situations generated by the WordNet structure and by the characteristics inherent to the domain. A descriptive approach was used to explain how maintaining the original WordNet structure can affect the extension of a specific domain. Our results show that, in order to extend domain-specific synsets, a structural reorganisation is unavoidable.

    Arabic medical entity tagging using distant learning in a Multilingual Framework

    A semantic tagger is presented that detects relevant entities in Arabic medical documents and tags them with their appropriate semantic class. The system takes advantage of a multilingual framework covering four languages (Arabic, English, French, and Spanish), so that resources available for each language can be used to improve the results of the others; this is especially important for less-resourced languages such as Arabic. The approach has been evaluated against Wikipedia pages in the four languages belonging to the medical domain. The core of the system is the definition of a base tagset consisting of the three most represented classes in the SNOMED-CT taxonomy, and the learning of a binary classifier for each semantic category in the tagset and each language, using a distant learning approach over three widely used knowledge resources, namely Wikipedia, DBpedia, and SNOMED-CT.
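
    The distant-learning idea described above can be sketched as follows (a toy illustration; the term list is a hard-coded stand-in for the SNOMED-CT / DBpedia resources, and all names are assumptions, not the paper's code):

```python
# Hypothetical sketch of distant supervision: use a knowledge resource's term
# list to silver-label sentences as positive/negative training examples for a
# per-class binary classifier. The term set below is a toy stand-in.
DISORDER_TERMS = {"pneumonia", "diabetes", "asthma"}

def silver_label(sentence, terms=DISORDER_TERMS):
    """Label 1 if any known term of the class occurs in the sentence, else 0."""
    tokens = {t.strip(".,;").lower() for t in sentence.split()}
    return int(bool(tokens & terms))

corpus = [
    "The patient was diagnosed with pneumonia.",
    "Follow-up visit scheduled for next month.",
]
print([silver_label(s) for s in corpus])  # -> [1, 0]
```

    The labels obtained this way are noisy ("silver standard"), which is why one classifier per class and language is trained and their outputs later combined.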

    TASS2018: Medical knowledge discovery by combining terminology extraction techniques with machine learning classification

    In this paper we present the procedure followed to complete the run submitted by the UPF-UPC team to the TASS 2018 Task 3 challenge. The procedure may be classified, according to the organization's codes, as H-KB-S, since it combines a knowledge-based methodology with supervised methods. Our pipeline includes: i) a standard pre-processing of the documents using the FreeLing tool suite (POS tagging and dependency parsing); ii) the use of a CRF sequence-labelling tool to complete both subtask A (key phrase identification) and subtask B (key phrase classification); and iii) addressing subtask C (setting semantic relationships) with a hybrid approach that integrates two Logistic Regression classifiers, followed by shallow lexical relation extractors for entity/entity pairs related by is-a and same-as relations.

    Semantic tagging and normalization of French medical entities

    In this paper we present two tools for addressing Task 2 in CLEF eHealth 2016. The first is a semantic tagger that detects relevant entities in French medical documents, tags them with their appropriate semantic class, and normalizes them with the Semantic Group codes defined in the UMLS. It is based on a distant learning approach that uses several SVM classifiers whose outputs are combined into a single result. The second tool is based on a symbolic procedure that obtains the English translation of each medical term and looks for normalization information in publicly accessible resources.

    Semantic tagging of French medical entities using distant learning

    In this paper we present a semantic tagger that detects relevant entities in French medical documents and tags them with their appropriate semantic class. These experiments have been carried out in the framework of the CLEF 2015 eHealth contest, which proposes a tagset of ten classes from the UMLS taxonomy. The system uses a set of binary classifiers and a combination mechanism for merging the classifiers' results. The classifiers are trained from two widely used knowledge sources, one domain-restricted and the other domain-independent.
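
    A simple way to combine per-class binary classifiers, as in the system above, is to take the highest-scoring class and fall back to a null label below a confidence threshold. A minimal sketch (the threshold, class names and fallback label are illustrative assumptions, not the paper's actual mechanism):

```python
# Hypothetical one-vs-rest combination: each semantic class has its own binary
# classifier producing a confidence score; the best class wins, with an "O"
# (outside) fallback when no classifier is confident enough.
def combine(scores, threshold=0.5):
    """scores: dict mapping class name -> binary classifier confidence."""
    best_class = max(scores, key=scores.get)
    return best_class if scores[best_class] >= threshold else "O"

print(combine({"Disorder": 0.81, "Chemical": 0.34, "Anatomy": 0.12}))  # -> Disorder
print(combine({"Disorder": 0.20, "Chemical": 0.34}))                   # -> O
```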

    Syntactic methods for negation detection in radiology reports in Spanish

    Identification of the certainty of events is an important text-mining problem. In particular, biomedical texts report medical conditions or findings that may be factual, hedged or negated. Identifying negation and its scope over a term of interest determines whether a finding is actually reported, and is a challenging task. Little work has been done for Spanish in this domain. In this work we introduce different algorithms developed to determine whether a term of interest is under the scope of negation in radiology reports written in Spanish. The methods include syntactic techniques based on rules derived from PoS-tagging patterns, constituent-tree patterns and dependency-tree patterns, as well as an adaptation of NegEx, a well-known rule-based negation detection algorithm (Chapman et al., 2001a). All methods outperform a simple dictionary-lookup algorithm developed as a baseline. NegEx and the PoS-tagging pattern method obtain the best results, with an F1 of 0.92.
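
    The core of a NegEx-style check can be sketched in a few lines (a simplified illustration, not the adapted system itself; the trigger list is a small sample, not the full lexicon of Chapman et al., 2001):

```python
# Simplified NegEx-style rule: a finding is considered negated if a negation
# trigger appears within a fixed window of tokens before it.
NEG_TRIGGERS = {"no", "sin", "niega", "descarta"}  # sample Spanish negation cues
WINDOW = 5  # how many tokens to look back

def is_negated(tokens, term_index, triggers=NEG_TRIGGERS, window=WINDOW):
    """True if a trigger occurs in the `window` tokens before the term."""
    start = max(0, term_index - window)
    return any(t.lower() in triggers for t in tokens[start:term_index])

report = "Se descarta derrame pleural".split()
print(is_negated(report, 2))  # -> True ("derrame" is preceded by "descarta")
```

    Real NegEx additionally handles post-posed triggers, pseudo-negations and scope-terminating conjunctions, which a window rule alone misses.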

    Utilización de Wikipedia para la extracción de términos en el dominio biomédico: primeras experiencias

    We present a term extractor that uses Wikipedia as a semantic information source. The system has been tested on a Spanish medical corpus. We compare the results obtained using a module of a hybrid term extractor and an equivalent module that uses Wikipedia. The results show that this resource can be used for this task.
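
    The underlying idea can be sketched as a filter over candidate terms (a toy illustration: the title set below is a hard-coded stand-in, whereas the actual system consults Wikipedia itself):

```python
# Hypothetical sketch: treat a match against a known Wikipedia article title
# as semantic evidence that a candidate string is a genuine domain term.
WIKI_TITLES = {"hepatitis b", "insulina", "radiografia"}  # toy stand-in set

def filter_terms(candidates, titles=WIKI_TITLES):
    """Keep only candidates whose lowercased form matches a known title."""
    return [c for c in candidates if c.lower() in titles]

print(filter_terms(["Insulina", "paciente", "Hepatitis B"]))
# -> ['Insulina', 'Hepatitis B']
```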